We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provide efficient stopping criteria for the approximation. The latter is the first extension of statistical model checking to unbounded properties in MDPs. In contrast with other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular properties of the MDP. We also show how our techniques extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.